<?php
 include "counter.php";

pagestart();
?>
<html>
<?php pagehead("Disrespectful Grubbing") ?>
<BODY bgcolor="white">
<?php pagetop("The Grub Robots.txt problem") ?>

On April 23, 2003, a strange problem was noted.
<p>
A number of accesses were made to a script called "display-person" on
the counter. This script displays information about a person,
including his or her email address.
<p>
Because this is sensitive data (spammers NOT welcome), it is protected
in various ways. But these accesses seemed to be coming from all over
the place, not just from one site or subnet, as has happened before.
<p>
A frantic hour of log investigation found the culprit: the <a
href="http://grub.org">Grub project</a>. Modelled on that earliest of
spiders, the <a href="http://www.searchtools.com/tools/harvest.html">Harvest project</a>,
Grub is a distributed indexing program that lets a multitude of spiders
collect data and share the resulting index.
<p>
Unfortunately, the project did not respect <a
href="http://www.robotstxt.org/wc/exclusion.html">robots.txt</a>,
which plainly warned all such spidering efforts against accessing this
particular URL. Investigation of the Grub site revealed that the
project is well aware of the problem but is waffling, claiming either
that it has already been solved or that it doesn't need to be solved.
<p>
<b>Nonsense.</b>
<p>
The Grub project is breaking the rules, and they know it.
<p>

So far, there have been more than 20,000 Grub accesses to the counter
overall, and more than 3,000 of them definitely violated the robots.txt
rules in the 15 hours since I started counting the rule-breaking
accesses.
<p>
Unusually for an open-source project, the web site offers zero (none,
nada) email addresses to complain to, and I have no desire to become
a participant in their forum system; I have enough of those.
<p>
You can follow the progress of the various attempts to scan the
counter at the <a href="/reports/malusstatus.php">Malus status</a>
page. When the Grub count stops increasing and the date of the last
access stops advancing, I'll believe that they've fixed the bug.
<p>
Until then, don't expect me to believe them.
<p>
UPDATE: As of April 24, 17:03 GMT, the grubbing seems to have stopped.
<p>
UPDATE: the respite was temporary. On May 3, at 3 in the morning, the grub returned to its
erroneous ways.
<p>
UPDATE (Jan 2004): After May, the problem mostly disappeared. The last
hit from Grub was seen on September 9, 2003, so the problem appears to
have been fixed permanently.
<p>
Harald Alvestrand, for the Linux Counter Project

<?php pagebottom("yes") ?>
</body>
</html>


